Grid-ODF: Detecting Outliers Effectively and Efficiently in Large Multi-dimensional Databases
نویسندگان
چکیده
Outlier detection is an important task in data mining that enjoys a wide range of applications such as detections of credit card fraud, criminal activity and exceptional patterns in databases. In recent years, there have been numerous research work in outlier detection and the new notions such as distance-based outliers and density-based local outliers have been proposed. However, the existing outlier detection algorithms suffer the drawbacks that they are inefficient in dealing with large multi-dimensional datasets and most of them are only able to capture certain kinds of outliers. In this paper, we will propose a novel outlier mining algorithm, called Grid-ODF, that takes into account both the local and global perspectives of outliers for effective detection. The notion of Outlying Degree Factor (ODF), that reflects the factors of both the density and distance, is introduced to rank outliers. A grid structure partitioning the data space is employed to enable Grid-ODF to be implemented efficiently. Experimental results show that Grid-ODF outperforms existing outlier detection algorithms such as LOF and KNN-distance in terms of effectiveness and efficiency.
منابع مشابه
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
متن کاملDetecting High-Dimensional Outliers: the New Task, Algorithms and Performance
Outlier detection is a fundamental step in knowledge discovery in databases. With the increasing number of high-dimensional databases, existing outlier detection algorithms that work only in the context of full space are unable to effectively screen out informative outliers. This is because majority of these outliers exists only in subspaces. In this paper, we identify a new outlier detection t...
متن کاملDetecting Outliers in Exponentiated Pareto Distribution
In this paper, we use two statistics for detecting outliers in exponentiated Paretodistribution. These statistics are the extension of the statistics for detecting outliers inexponential and gamma distributions. In fact, we compare the power of our test statisticsbased on the simulation study and identify the better test statistic for detecting outliers inexponentiated Pareto distribution. At t...
متن کاملRobust Subspace Approaches for Analyzing Incomplete Synchrophasor Measurements
Synchrophasor measurements can significantly enhance the monitorability of the power grid by revealing the dynamics of grid operation. However, due to high-rate samples collected in large volume, big data challenges emerge to efficiently process the data. The present work advocates robust subspace approaches including robust principal component analysis and subspace clustering, to identify low-...
متن کاملOutlier detection for high dimensional data pdf
Is particularly useful for high dimensional data where outliers cannot be found.High dimensional data in Euclidean space pose special challenges to data. In about just the last few years, the task of unsupervised outlier detection has found.Outlier detection is an outstanding data mining task referred to open pdf with mac word class="text" href="https://tokiqivy.files.wordpress.com/2015/06/opel...
متن کامل